A generalized LR parser for text-to-speech synthesis

Author

  • Per Olav Heggtveit
Abstract

The development of a parser for a Norwegian text-to-speech system is reported. The Generalized LR (GLR) algorithm [1] is applied; it generalizes the well-known LR algorithm for parsing computer languages. This paper briefly describes the GLR algorithm, the integration of a probabilistic scoring model, our C++ implementation of the parser, attribute structures, the lexical interface, and the application of the parser to part-of-speech (POS) tagging for Norwegian. On a small test set of about 4 000 words, this method correctly tags 96 % of the known words, which is close to the performance of other POS taggers trained on large text databases [2] [3]. Of the unknown words, 85 % are tagged correctly, and the probability of choosing the wrong pronunciation of a word from the lexicon is less than 0.1 %.
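
The abstract does not spell out the algorithm, so what follows is a minimal Python sketch, not the author's C++ implementation, of the idea behind GLR-based tagging: the parser keeps several parse branches alive at once, forking whenever a word has more than one possible tag or a table cell holds more than one action, and each branch carries a score built by multiplying lexical and rule probabilities. The toy grammar, hand-built SLR table, lexicon, probabilities and the names `step` and `best_tagging` are all hypothetical, and the paper's actual scoring model may differ. For brevity the sketch copies whole stacks instead of sharing them in the graph-structured stack of the full GLR algorithm, so it is exponential in the worst case.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy grammar over POS tags N and V; each rule is (lhs, rhs length, probability).
# 0: S' -> S   1: S -> NP V   2: S -> NP   3: NP -> N   4: NP -> N N
RULES = [("S'", 1, 1.0), ("S", 2, 0.7), ("S", 1, 0.3), ("NP", 1, 0.8), ("NP", 2, 0.2)]

# Hand-built SLR table for the grammar above. Each cell holds a list of actions
# because a GLR table may contain conflicts; the parser forks on every entry.
ACTIONS: Dict[Tuple[int, str], List[Tuple[str, int]]] = {
    (0, "N"): [("shift", 3)],
    (1, "$"): [("accept", 0)],
    (2, "V"): [("shift", 4)], (2, "$"): [("reduce", 2)],
    (3, "N"): [("shift", 5)], (3, "V"): [("reduce", 3)], (3, "$"): [("reduce", 3)],
    (4, "$"): [("reduce", 1)],
    (5, "V"): [("reduce", 4)], (5, "$"): [("reduce", 4)],
}
GOTO = {(0, "S"): 1, (0, "NP"): 2}

# One parse branch: (LR state stack, tags chosen so far, score).
Branch = Tuple[Tuple[int, ...], Tuple[str, ...], float]


def step(branches: List[Branch], tag: str, lex_prob: float) -> List[Branch]:
    """Advance every branch over one input symbol, forking on every applicable action."""
    done: List[Branch] = []
    work = list(branches)
    while work:
        stack, tags, score = work.pop()
        # A missing table entry yields no actions, so that branch simply dies.
        for kind, target in ACTIONS.get((stack[-1], tag), []):
            if kind == "shift":
                done.append((stack + (target,), tags + (tag,), score * lex_prob))
            elif kind == "reduce":
                lhs, n, p = RULES[target]
                base = stack[:-n]  # pop one state per right-hand-side symbol
                work.append((base + (GOTO[(base[-1], lhs)],), tags, score * p))
            else:  # "accept", which only occurs on the end marker "$"
                done.append((stack, tags, score))
    return done


def best_tagging(words: List[str], lexicon: Dict[str, List[Tuple[str, float]]]
                 ) -> Optional[Tuple[List[str], float]]:
    """Return (tags, score) of the highest-scoring complete parse, or None if none exists."""
    branches: List[Branch] = [((0,), (), 1.0)]
    for word in words:
        nxt: List[Branch] = []
        for tag, prob in lexicon[word]:  # lexical ambiguity: try every tag of the word
            nxt.extend(step(branches, tag, prob))
        branches = nxt
    accepted = step(branches, "$", 1.0)
    if not accepted:
        return None
    _, tags, score = max(accepted, key=lambda b: b[2])
    return list(tags), score


# Hypothetical lexicon giving P(tag | word) for two ambiguous English words.
LEXICON = {"fish": [("N", 0.6), ("V", 0.4)], "swim": [("N", 0.2), ("V", 0.8)]}
print(best_tagging(["fish", "swim"], LEXICON))
```

With this lexicon, the readings that start with a verb die immediately, and "fish swim" has two complete parses: N V (via S → NP V) and N N (via NP → N N followed by S → NP). The N V reading scores 0.6 × 0.8 × 0.8 × 0.7 ≈ 0.269 against 0.6 × 0.2 × 0.2 × 0.3 = 0.0072 for N N, so the sketch returns the N V tagging; the grammar prunes impossible tag sequences and the scores rank the survivors.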

Similar resources

An Experimental Real-Time Speech-to-Speech Translation System

This paper reports the current progress in the SPEECHTRANS project at the Center for Machine Translation which is a speech-to-speech translation project for real-time processing of speaker-independent noisy continuous speech input. SPEECHTRANS uses custom speech recognition hardware and a phoneme-based generalized LR parser that uses a unification-based grammar formalism and a natural languag...

Robust Parsing of Noise Contaminated and Extra-grammatical Input: a Grammar Focused Approach

This thesis tackles the problem of parsing noise contaminated input by identifying and parsing the maximal subset of the input string that is found to be grammatical. I develop a parser that is based on the Generalized LR Parsing paradigm and performs this task efficiently. Since the parser uses the grammar to identify the meaningful words of the input, it can be viewed as a focusing tool. The pa...

Bilingual aligned corpora for speech to speech translation for Spanish, English and Catalan

In the framework of the EU-funded Project LC-STAR, a set of Language Resources (LR) for all the Speech to Speech Translation components (Speech recognition, Machine Translation and Speech Synthesis) was developed. This paper deals with the development of bilingual corpora in Spanish, US English and Catalan. The corpora were obtained from spontaneous dialogues in one of these three languages whi...

Empirical Support for Probabilistic GLR Parsing

This paper discusses the effectiveness of a new probabilistic generalized LR model (PGLR) in word-based parsing (morphological and syntactic analysis) tasks, in which we have to consider the word segmentation and multiple part-of-speech problems. Parsing a sentence from the morphological level makes the task much more complex because of the increase of parse ambiguity stemming from word segmenta...

Connectionist and Symbolic Processing in Speech-to-Speech Translation: The JANUS System

We present JANUS, a speech-to-speech translation system that utilizes diverse processing strategies including connectionist learning, traditional AI knowledge representation approaches, dynamic programming, and stochastic techniques. JANUS translates continuously spoken English utterances into Japanese and German speech utterances. The overall system performance on a corpus of conference regist...

Publication date: 1996